An Adaptive Partitional Clustering Method for Categorical Attribute Using K-medoid
Author
Abstract
Abstract: Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The operation is needed in a number of data mining tasks, such as unsupervised classification and data summarization, as well as the segmentation of large heterogeneous data sets into smaller homogeneous subsets that can be easily managed, separately modeled, and analyzed. Clustering is a popular approach to implementing this operation. Partitional clustering algorithms attempt to directly decompose the data set into a set of disjoint clusters. More specifically, they attempt to determine an integer number of partitions that optimize a certain criterion function. The criterion function may emphasize the local or global structure of the data, and its optimization is an iterative procedure. This work aims to analyze the observation that partitional clustering algorithms perform more efficiently for numerical attributes than for categorical attributes, and to determine which algorithm best suits matrix data. Such algorithms work with larger datasets with many attributes. For the analysis, the Iris dataset was retrieved from the UCI data repository and used with K-Medoid. The outcome of the algorithm is a partition of the data into clusters, which can also be visualized graphically. The cluster plots distinguish the clusters by color and mark each cluster's centroid measure distinctly. Finally, it was determined that K-Medoid is the better partitional algorithm.
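The iterative optimization of a criterion function described in the abstract can be illustrated with a minimal k-medoids sketch. This is not the paper's implementation: the function names (matching_distance, k_medoids), the simple-matching dissimilarity for categorical values, the alternating assign/update scheme, and the toy records are all assumptions made for illustration. The criterion minimized here is the total dissimilarity between each object and the medoid of its cluster.

```python
import random
from typing import Callable, List, Sequence, Tuple


def matching_distance(a: Sequence, b: Sequence) -> int:
    """Simple matching dissimilarity: count of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(points: List[Sequence], medoids: List[int], dist: Callable) -> List[int]:
    """Assign every point to the cluster of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(
    points: List[Sequence],
    k: int,
    dist: Callable = matching_distance,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[int], int]:
    """Alternate between assignment and medoid update until medoids stop changing.

    Returns (medoid indices, cluster label per point, total cost).
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # assumed: random initial medoids
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Empty cluster (possible with duplicate points): keep old medoid.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = assign(points, medoids, dist)
    cost = sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))
    return medoids, labels, cost


# Toy categorical records, invented for this example (not from the paper).
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
medoids, labels, cost = k_medoids(records, k=2)
print("medoids:", [records[m] for m in medoids])
print("labels:", labels)
print("total cost:", cost)
```

Because the medoids are always actual records, the same routine would work on numerical data such as Iris by passing a different dist function (for example, Euclidean distance). The result depends on the initial medoids, so different seeds may give different partitions.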
Similar resources
Genetic Algorithms in Partitional Clustering: A Comparison
Three approaches to partitional clustering using genetic algorithms (GA) are compared with k-means and the EM algorithm for three real world datasets (Iris, Glass and Vowel). The GA techniques differ in their encoding of the clustering problem using either a class id for each object (GAIE), medoids to assign objects to the class associated with the nearest medoid (GAME), or parameters for multi...
Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation
Clustering accuracy of partitional clustering algorithm for categorical data depends primarily on the choice of initial data points to instigate the clustering process and hence the clustering results cannot be generated and repeated consistently. In this paper we present an approach to compute initial modes for K-mode partitional clustering algorithm to cluster categorical data sets. Here we u...
A cluster centers initialization method for clustering categorical data
Keywords: the k-modes algorithm; initialization method; initial cluster centers; density; distance. Abstract: The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the performance of the k-modes clustering algorithm, which converges to numerous local minima, strongly depends on initial cluster centers...
Differential evolution and particle swarm optimisation in partitional clustering
In recent years, many partitional clustering algorithms based on genetic algorithms (GA) have been proposed to tackle the problem of finding the optimal partition of a data set. Surprisingly, very few studies considered alternative stochastic search heuristics other than GAs or simulated annealing. Two promising algorithms for numerical optimization, which are hardly known outside the heuristic...
Context-Based Distance Learning for Categorical Data Clustering
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition of this work is that the d...
Journal title:
Volume, Issue
Pages -
Publication date 2013